Abstract

Frigyes Karinthy, in his 1929 short story "Láncszemek" ("Chains"), suggested that any two persons are distanced by at most six friendship links.¹ Stanley Milgram in his famous experiment [20, 23] challenged people to route postcards to a fixed recipient by passing them only through direct acquaintances. The average number of intermediaries on the path of the postcards lay between 4.4 and 5.7, depending on the sample of people chosen.

We report the results of the first world-scale social-network graph-distance computations, using the entire Facebook network of active users (≈ 721 million users, 69 billion friendship links). The average distance we observe is 4.74, corresponding to 3.74 intermediaries or "degrees of separation", showing that the world is even smaller than we expected, and prompting the title of this paper. More generally, we study the distance distribution of Facebook and of some interesting geographic subgraphs, looking also at their evolution over time.

The networks we are able to explore are almost two orders of magnitude larger than those analysed in the previous literature. We report detailed statistical metadata showing that our measurements (which rely on probabilistic algorithms) are very accurate.



1 Introduction

At the 20th World-Wide Web Conference, in Hyderabad, India, one of the authors (Sebastiano) presented a new tool for studying the distance distribution of very large graphs: HyperANF [3]. Building on previous graph compression work [4] and on the idea of diffusive computation pioneered in [21], the new tool made it possible to accurately study the distance distribution of graphs orders of magnitude larger than was previously possible.

1 The exact wording of the story is slightly ambiguous: "He bet us that, using no more than five individuals, one of whom is a personal acquaintance, he could contact the selected individual [...]". It is not completely clear whether the selected individual is part of the five, so this could actually allude to distance five or six in the language of graph theory, but the "six degrees of separation" phrase stuck after John Guare's 1990 eponymous play. Following Milgram's definition and Guare's interpretation (see further on), we will assume that "degrees of separation" is the same as "distance minus one", where "distance" is the usual path length (the number of arcs in the path).

One of the goals in studying the distance distribution is the identification of interesting statistical parameters that can be used to tell proper social networks from other complex networks, such as web graphs. More generally, the distance distribution is one interesting global feature that makes it possible to reject probabilistic models even when they match local features such as the in-degree distribution.

In particular, earlier work had shown that the spid,² which measures the dispersion of the distance distribution, appeared to be smaller than one (underdispersion) for social networks, but larger than one (overdispersion) for web graphs [3]. Hence, during the talk, one of the main open questions was "What is the spid of Facebook?".
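As a small illustration (a sketch, not the authors' code), the spid can be computed from any distance distribution given as a map from distance to probability mass: it is the variance-to-mean ratio of that distribution. The function names and the toy distribution below are hypothetical.

```python
from typing import Dict, Tuple


def distance_moments(dist: Dict[int, float]) -> Tuple[float, float]:
    """Mean and variance of a distance distribution given as {distance: probability mass}."""
    total = sum(dist.values())  # normalise in case the masses do not sum exactly to 1
    mean = sum(d * p for d, p in dist.items()) / total
    variance = sum((d - mean) ** 2 * p for d, p in dist.items()) / total
    return mean, variance


def spid(dist: Dict[int, float]) -> float:
    """Shortest-paths index of dispersion: variance-to-mean ratio of the distances."""
    mean, variance = distance_moments(dist)
    return variance / mean


toy = {1: 0.1, 2: 0.3, 3: 0.6}  # hypothetical distribution: mean 2.5, variance 0.45
print(spid(toy))                # about 0.18, i.e. underdispersed (spid < 1)
```

With the convention of footnote 1, the "degrees of separation" of the same distribution would simply be its mean minus one.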

Lars Backstrom happened to listen to the talk, and suggested a collaboration studying the Facebook graph. This was of course an extremely intriguing possibility: besides testing the "spid hypothesis", computing the distance distribution of the Facebook graph would have been the largest Milgram-like [20] experiment ever performed, orders of magnitude larger than previous attempts (during our experiments Facebook had ≈ 721 million active users and ≈ 69 billion friendship links).

This paper reports our findings in studying the distance distribution of the largest electronic social network ever created. That world is smaller than we thought: the average distance of the current Facebook graph is 4.74. Moreover, the spid of the graph is just 0.09, corroborating the conjecture [3] that proper social networks have a spid well below one. We also observe, contrary to previous literature analysing graphs orders of magnitude smaller, both a stabilisation of the average distance over time, and that the density of the Facebook graph over time does not neatly fit previous models.

Towards a deeper understanding of the structure of the Facebook graph, we also apply recent compression techniques that exploit the underlying cluster structure of the graph to increase locality. The results obtained suggest the existence of overlapping clusters similar to those observed in other social networks.

2 The spid (shortest-paths index of dispersion) is the variance-to-mean ratio of the distance distribution.

Replicability of scientific results is important. While for obvious nondisclosure reasons we cannot release to the public the actual 30 graphs that have been studied in this paper, we distribute freely the derived data upon which the tables and figures of this paper have been built, that is, the WebGraph properties, which contain structural information about the graphs, and the probabilistic estimations of their neighbourhood functions (see below) that have been used to study their distance distributions. The software used in this paper is distributed under the GNU (Lesser) General Public License.³

2 Related work 

The most obvious precursor of our work is Milgram's celebrated "small world" experiment, described first in [20] and later with more details in [23]: Milgram's works were actually following a stream of research started in sociology and psychology in the late 50s [12]. In his experiment, Milgram aimed at answering the following question (in his words): "given two individuals selected randomly from the population, what is the probability that the minimum number of intermediaries required to link them is 0, 1, 2, ..., k?".

The technique Milgram used (inspired by [22]) was the following: he selected 296 volunteers (the starting population) and asked them to dispatch a message to a specific individual (the target person), a stockholder living in Sharon, MA, a suburb of Boston, and working in Boston. The message could not be sent directly to the target person (unless the sender knew him personally), but could only be mailed to a personal acquaintance who is more likely than the sender to know the target person. The starting population was selected as follows: 100 of them were people living in Boston, 100 were Nebraska stockholders (i.e., people living far from the target but sharing with him their profession) and 96 were Nebraska inhabitants chosen at random.

In a nutshell, the results obtained from Milgram's experiments were the following: only 64 chains (22%) were completed (i.e., they reached the target); the average number of intermediaries in these chains was 5.2, with a marked difference between the Boston group (4.4) and the rest of the starting population, whereas the difference between the two other subpopulations was not statistically significant; at the other end of the spectrum, the random (and essentially clueless) group from Nebraska needed 5.7 intermediaries on average (i.e., rounding up, "six degrees of separation"). The main conclusions outlined in Milgram's paper were that the average path length is small, much smaller than expected, and that geographic location seems to have an impact on the average length whereas other information (e.g., profession) does not.

3 See http://{webgraph,law}.dsi.unimi.it/.

There is of course a fundamental difference between our experiment and what Milgram did: Milgram was measuring the average length of a routing path on a social network, which is of course an upper bound on the average distance (as the people involved in the experiment were not necessarily sending the postcard to an acquaintance on a shortest path to the destination).⁴ In a sense, the results he obtained are even more striking, because not only do they prove that the world is small, but also that the actors living in the small world are able to exploit its smallness. It should be remarked, however, that in [20, 23] the purpose of the authors is to estimate the number of intermediaries: the postcards are just a tool, and the details of the paths they follow are studied only as an artifact of the measurement process. The interest in efficient routing lies more in the eye of the beholder (e.g., the computer scientist) than in Milgram's: had he had at his disposal an actual large database of friendship links and algorithms like the ones we use, he would have dispensed with the postcards altogether.

Incidentally, there have been some attempts to reproduce Milgram-like routing experiments on various large networks [15, 14, 11], but the results in this direction are still very preliminary because notions such as identity, knowledge or routing are still poorly understood in social networks.

We limited ourselves to the part of Milgram's experiment that is more clearly defined, that is, the measurement of shortest paths. The largest experiment similar to the ones presented here that we are aware of is [15], where the authors considered a communication graph with 180 million nodes and 1.3 billion edges extracted from a snapshot of the Microsoft Messenger network; they find an average distance of 6.6 (i.e., 5.6 intermediaries; again, rounding up, six degrees of separation). Note, however, that the communication graph in [15] has an edge between two persons only if they communicated during a specific one-month observation period, and thus does not take into account friendship links through which no communication was detected.

The authors of [24], instead, study the distance distribution of some small-sized social networks. In both cases the networks were undirected and small enough (by at least two orders of magnitude) to be accessed efficiently in a random fashion, so the authors used sampling techniques. We remark, however, that sampling is not easily applicable to directed networks (such as Twitter) that are not strongly connected, whereas our techniques would still work (for some details about the applicability of sampling, see [8]).

4 Incidentally, this observation is at the basis of one of the most intense monologues in Guare's play: Ouisa, unable to locate Paul, the con man who convinced them he is the son of Sidney Poitier, says "I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everybody else on this planet. [...] But to find the right six people." Note that this fragment of the monologue clearly shows that Guare's interpretation of the "six degrees of separation" idea is equivalent to distance seven in the graph-theoretical sense.

Analysing the evolution of social networks in time is also a lively trend of research. Leskovec, Kleinberg and Faloutsos observe in [16] that the average degree of complex networks increases over time while the effective diameter shrinks. Their experiments are conducted on a much smaller scale (their largest graph has 4 million nodes and 16 million arcs), but it is interesting that the phenomena observed seem quite consistent. Probably the most controversial point is the hypothesis that the number of edges m(t) at time t is related to the number of nodes n(t) by the following relation:

$$m(t) \propto n(t)^a,$$

where a is a fixed exponent usually lying in the open interval (1, 2). We will discuss this hypothesis in light of our findings.
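One simple way to probe the relation above is a least-squares fit of log m(t) against log n(t), whose slope estimates a. The helper below is hypothetical and is not the method of [16]; note that a fitted slope alone says nothing about whether a power law is a good model for the data.

```python
import math
from typing import Sequence


def fit_exponent(nodes: Sequence[float], edges: Sequence[float]) -> float:
    """Least-squares slope of log m against log n, i.e. an estimate of a in m(t) ∝ n(t)^a."""
    if len(nodes) != len(edges) or len(nodes) < 2:
        raise ValueError("need at least two (n, m) observations")
    xs = [math.log(n) for n in nodes]
    ys = [math.log(m) for m in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check: data generated exactly as m = 3 n^1.5 recovers a = 1.5.
ns = [1e3, 1e4, 1e5, 1e6]
print(fit_exponent(ns, [3 * n ** 1.5 for n in ns]))
```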



3 Definitions and Tools 

The neighbourhood function $N_G(t)$ of a graph G returns for each $t \in \mathbf{N}$ the number of pairs of nodes (x, y) such that y is reachable from x in at most t steps. It provides data about how fast the "average ball" around each node expands. From the neighbourhood function it is possible to derive the distance distribution (between reachable pairs), which gives for each t the fraction of reachable pairs at distance exactly t.
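For a graph small enough to fit in memory, both quantities can be computed exactly with one breadth-first visit per node. The sketch below (hypothetical helper names; adjacency lists indexed by node) is only a reference point for what HyperANF approximates: it costs O(nm) time and is hopeless at Facebook scale. Following the definition above, the pairs (x, x) at distance 0 are counted as reachable.

```python
from collections import deque
from typing import List


def neighbourhood_function(adj: List[List[int]]) -> List[int]:
    """Exact N_G(t) for t = 0, 1, ..., up to the largest finite distance."""
    at_distance: List[int] = []  # at_distance[t] = number of pairs (x, y) with d(x, y) = t
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            while len(at_distance) <= d:
                at_distance.append(0)
            at_distance[d] += 1
    # N_G(t) is the cumulative sum of the number of pairs at distance exactly t.
    nf: List[int] = []
    running = 0
    for count in at_distance:
        running += count
        nf.append(running)
    return nf


def distance_distribution(nf: List[int]) -> List[float]:
    """Fraction of reachable pairs at distance exactly t (the last value of N_G is the total)."""
    total = nf[-1]
    return [(nf[t] - (nf[t - 1] if t else 0)) / total for t in range(len(nf))]


path = [[1], [0, 2], [1]]  # undirected path 0 - 1 - 2, stored as symmetric arcs
print(neighbourhood_function(path))  # [3, 7, 9]
```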

In this paper we use HyperANF, a diffusion-based algorithm (building on ANF [21]) that is able to approximate quickly the neighbourhood function of very large graphs; our implementation uses, in turn, WebGraph [4] to represent in a compressed but quickly accessible form the graphs to be analysed.

HyperANF is based on the observation (made in [21]) that $B(x,r)$, the ball of radius r around node x, satisfies

$$B(x,r) = \bigcup_{x \to y} B(y, r-1) \cup \{x\}.$$

Since $B(x,0) = \{x\}$, we can compute each $B(x,r)$ incrementally using sequential scans of the graph (i.e., scans in which we go in turn through the successor list of each node). The obvious problem is that during the scan we need to access randomly the sets $B(x, r-1)$ (the sets $B(x,r)$ can be just saved on disk on an update file and reloaded later).
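The recurrence can be turned into code directly. The sketch below (the function name is hypothetical) keeps the balls as exact Python sets, which is precisely the memory problem just described, and at each iteration scans the successor lists in node order while reading the previous iteration's balls.

```python
from typing import List, Set


def neighbourhood_function_by_diffusion(adj: List[List[int]], max_r: int) -> List[int]:
    """N_G(r) for r = 0..max_r via B(x, r) = ∪_{x→y} B(y, r-1) ∪ {x}, using exact sets."""
    n = len(adj)
    prev: List[Set[int]] = [{x} for x in range(n)]  # B(x, 0) = {x}
    nf = [n]
    for _ in range(max_r):
        cur: List[Set[int]] = []
        for x in range(n):              # sequential scan over the successor lists
            ball = {x}
            for y in adj[x]:
                ball |= prev[y]         # random access to the previous iteration's balls
            cur.append(ball)
        prev = cur                      # the text notes these could go to an update file on disk
        nf.append(sum(len(b) for b in prev))
    return nf


print(neighbourhood_function_by_diffusion([[1], [0, 2], [1]], 3))  # [3, 7, 9, 9]
```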

The space needed for such sets would be too large to be kept in main memory. However, HyperANF represents these sets in an approximate way, using HyperLogLog counters [10], which should be thought of as dictionaries that can answer reliably just questions about size. Each such counter is made of a number of small (in our case, 5-bit) registers. In a nutshell, a register keeps track of the maximum number M of trailing zeroes of the values of a good hash function applied to the elements of a sequence of nodes: the number of distinct elements in the sequence is then proportional to $2^M$. A technique called stochastic averaging is used to divide the stream into a number of substreams, each analysed by a different register. The result is then computed by aggregating suitably the estimation from each register (see [10] for details).
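To make the register mechanics concrete, here is a minimal HyperLogLog counter in the style of [10]. It is a sketch under several assumptions of ours, not HyperANF's implementation: the 64-bit hash is taken from SHA-1, the low bits of the hash choose the register (stochastic averaging), each register stores one plus the number of trailing zeroes of the remaining bits, capped at 31 to mimic 5-bit registers, and the bias constants and small-range correction follow the original HyperLogLog paper.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog counter with m = 2**log2m registers (illustrative sketch)."""

    MAX_REGISTER = 31  # largest value a 5-bit register can hold

    def __init__(self, log2m: int = 6) -> None:
        if log2m < 4:
            raise ValueError("HyperLogLog needs at least 16 registers")
        self.log2m = log2m
        self.m = 1 << log2m
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: object) -> int:
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little")

    def add(self, item: object) -> None:
        h = self._hash64(item)
        j = h & (self.m - 1)       # stochastic averaging: the low bits pick the register
        w = h >> self.log2m        # the remaining 64 - log2m bits
        # 1 + number of trailing zeroes of w ((w & -w) isolates the lowest set bit).
        rho = (w & -w).bit_length() if w else 64 - self.log2m + 1
        self.registers[j] = max(self.registers[j], min(rho, self.MAX_REGISTER))

    def alpha(self) -> float:
        if self.m == 16:
            return 0.673
        if self.m == 32:
            return 0.697
        if self.m == 64:
            return 0.709
        return 0.7213 / (1 + 1.079 / self.m)

    def estimate(self) -> float:
        raw = self.alpha() * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return raw


counter = HyperLogLog(log2m=8)
for node in range(10_000):
    counter.add(node)
print(round(counter.estimate()))  # close to 10 000; 1.06/sqrt(256) is about 6.6%
```

Adding an element twice never changes a register, which is why the counter behaves like a set that can only answer questions about its size.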

The main performance challenge to solve is how to quickly compute the HyperLogLog counter associated to a union of balls, each represented, in turn, by a HyperLogLog counter: HyperANF uses an algorithm based on word-level parallelism that makes the computation very fast, and a carefully engineered implementation exploits multicore architectures with a linear speedup in the number of cores.
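Because every register keeps a maximum, the counter of a union of two sets is just the register-wise maximum of the two counters, which is what makes the diffusion step cheap. The sketch below combines this with the recurrence for B(x, r) into a toy, single-threaded approximation of the neighbourhood function. It rests on our own assumptions (plain element-wise maxima instead of HyperANF's word-level parallel maxima, a SHA-1-based hash, no systolic optimisation, m a power of two of at least 128), and all names are hypothetical.

```python
import hashlib
import math
from typing import List

Registers = List[int]  # one HyperLogLog counter = a list of m registers


def _hash64(item: object) -> int:
    return int.from_bytes(hashlib.sha1(repr(item).encode("utf-8")).digest()[:8], "little")


def singleton(item: object, m: int) -> Registers:
    """Counter for the one-element set {item}."""
    b = m.bit_length() - 1
    h = _hash64(item)
    w = h >> b
    regs = [0] * m
    regs[h & (m - 1)] = min((w & -w).bit_length() if w else 64 - b + 1, 31)
    return regs


def union_into(target: Registers, other: Registers) -> bool:
    """Turn target into the counter of the union (register-wise max); True if it changed."""
    changed = False
    for i, r in enumerate(other):
        if r > target[i]:
            target[i] = r
            changed = True
    return changed


def estimate(regs: Registers) -> float:
    m = len(regs)
    alpha = 0.7213 / (1 + 1.079 / m)  # asymptotic constant, intended for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    return m * math.log(m / zeros) if raw <= 2.5 * m and zeros else raw


def approximate_neighbourhood_function(adj: List[List[int]], m: int = 256, max_iter: int = 64) -> List[float]:
    if m < 128 or m & (m - 1):
        raise ValueError("m must be a power of two >= 128 in this sketch")
    n = len(adj)
    prev = [singleton(x, m) for x in range(n)]  # counters for B(x, 0) = {x}
    nf = [sum(estimate(c) for c in prev)]
    for _ in range(max_iter):
        cur = [list(c) for c in prev]           # B(x, r-1) is contained in B(x, r): start from it
        changed = False
        for x in range(n):
            for y in adj[x]:
                changed |= union_into(cur[x], prev[y])
        if not changed:                         # every counter is stable: nothing more to do
            break
        prev = cur
        nf.append(sum(estimate(c) for c in prev))
    return nf


n = 40
path = [[y for y in (x - 1, x + 1) if 0 <= y < n] for x in range(n)]
estimates = approximate_neighbourhood_function(path)
print(round(estimates[0]), round(estimates[-1]))  # exact values: 40 and 1600
```

On this 40-node path the first estimate is close to 40 and the last one close to 40² = 1600, the exact values; the residual error comes entirely from the counters.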

Another important feature of HyperANF is that it uses a systolic approach to avoid recomputing balls that do not change during an iteration. This approach is essential for computing the entire distance distribution, avoiding the arbitrary termination conditions used by previous approaches, which have no provable accuracy (see [3] for an example).
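The idea can be illustrated with exact sets (a sketch, not HyperANF's actual bookkeeping; the function name is hypothetical): B(x, r) can differ from B(x, r − 1) only if the ball of some successor y of x changed in the previous iteration, so each iteration only revisits the predecessors of the nodes that just changed, and the computation stops, with no arbitrary threshold, when nothing changes.

```python
from typing import Dict, List, Set


def neighbourhood_function_systolic(adj: List[List[int]]) -> List[int]:
    """Exact N_G(r) until stabilisation, recomputing only balls that may have changed."""
    n = len(adj)
    preds: List[List[int]] = [[] for _ in range(n)]
    for x, successors in enumerate(adj):
        for y in successors:
            preds[y].append(x)
    balls: List[Set[int]] = [{x} for x in range(n)]
    nf = [n]
    changed: Set[int] = set(range(n))  # at r = 0 every ball is new
    while changed:
        # Only predecessors of nodes whose ball just changed can change now.
        candidates = {x for y in changed for x in preds[y]}
        updates: Dict[int, Set[int]] = {}
        for x in candidates:
            ball = set(balls[x])         # B(x, r-1), which already contains x
            for y in adj[x]:
                ball |= balls[y]         # balls[y] still holds B(y, r-1)
            if len(ball) > len(balls[x]):
                updates[x] = ball
        for x, ball in updates.items():  # apply only after the whole scan
            balls[x] = ball
        changed = set(updates)
        if changed:
            nf.append(sum(len(b) for b in balls))
    return nf


print(neighbourhood_function_systolic([[1], [0, 2], [1]]))  # [3, 7, 9]
```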

3.1 Theoretical error bounds 

The result of a run of HyperANF at the t-th iteration is an estimation of the neighbourhood function in t. We can see it as a random variable

$$\hat N_G(t) = \sum_{0 \le i < n} X_{i,t}$$

where each $X_{i,t}$ is the HyperLogLog counter that counts nodes reached by node i in t steps (n is the number of nodes of the graph). When m registers per counter are used, each $X_{i,t}$ has a guaranteed relative standard deviation $\eta_m \le 1.06/\sqrt{m}$.

It is shown in [3] that the output $\hat N_G(t)$ of HyperANF at the t-th iteration is an asymptotically almost unbiased estimator of $N_G(t)$, that is

$$\frac{E[\hat N_G(t)]}{N_G(t)} = 1 + \delta_1(n) + o(1) \quad \text{for } n \to \infty,$$

where $\delta_1$ is the same as in [10, Theorem 1] (and $|\delta_1| < 5 \cdot 10^{-5}$ as soon as $m \ge 16$). Moreover, $\hat N_G(t)$ has a relative standard deviation not greater than that of the $X_i$'s, that is

$$\frac{\sqrt{\operatorname{Var}[\hat N_G(t)]}}{N_G(t)} \le \eta_m.$$

In particular, our runs used $m = 64$ ($\eta_m = 0.1325$) for all graphs except for the two largest Facebook graphs, where we used $m = 32$ ($\eta_m = 0.187$). Runs were repeated so as to obtain a uniform relative standard deviation for all graphs.
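The two values of η_m quoted above follow from the bound 1.06/√m, as the short check below reproduces. The second helper adds an assumption of ours that the text does not state: if k independent runs are averaged, the relative standard deviation shrinks by a factor √k, which gives a rough idea of how many repetitions a target precision (such as the 5.8% mentioned in Section 4) would take.

```python
import math


def eta(m: int) -> float:
    """Guaranteed relative standard deviation bound 1.06/sqrt(m) for m registers."""
    return 1.06 / math.sqrt(m)


def runs_needed(m: int, target: float) -> int:
    """Smallest k with eta(m)/sqrt(k) <= target, assuming k independent runs are averaged."""
    return math.ceil((eta(m) / target) ** 2)


print(f"{eta(64):.4f} {eta(32):.3f}")                  # 0.1325 0.187
print(runs_needed(64, 0.058), runs_needed(32, 0.058))  # 6 11
```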

Unfortunately, the relative error for the neighbourhood function becomes an absolute error for the distance distribution. Thus, the theoretical bounds one obtains for the moments of the distance distribution are quite ugly. Actually, the simple act of dividing the neighbourhood function values by the last value to obtain the cumulative distribution function is nonlinear, and introduces bias in the estimation.

To reduce bias and provide estimates of the standard error of our measurements, we use the jackknife [9], a classical nonparametric method for evaluating arbitrary statistics on a data sample, which turns out to be very effective in practice [3].
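For readers unfamiliar with the jackknife, the sketch below shows the leave-one-out recipe: the statistic is recomputed with one run removed at a time, and the spread of these recomputations yields a bias-corrected estimate and a standard error. Here a sample is a list of neighbourhood-function estimates from independent runs, and the statistic is the average distance of the averaged function; the helper names and this choice of statistic are illustrative assumptions, not the authors' code.

```python
import math
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Run = List[float]  # one run's estimate of N_G(0), N_G(1), ...


def average_distance(runs: Sequence[Run]) -> float:
    """Mean of the distance distribution derived from the average of the given runs."""
    length = max(len(r) for r in runs)
    padded = [list(r) + [r[-1]] * (length - len(r)) for r in runs]  # N_G stays constant once stable
    nf = [fmean(column) for column in zip(*padded)]
    total, previous, mean = nf[-1], 0.0, 0.0
    for t, value in enumerate(nf):
        mean += t * (value - previous) / total  # t times the mass at distance exactly t
        previous = value
    return mean


def jackknife(runs: Sequence[Run], statistic: Callable[[Sequence[Run]], float]) -> Tuple[float, float]:
    """Jackknife bias-corrected estimate and standard error of statistic(runs)."""
    k = len(runs)
    if k < 2:
        raise ValueError("the jackknife needs at least two runs")
    full = statistic(runs)
    leave_one_out = [statistic(list(runs[:i]) + list(runs[i + 1:])) for i in range(k)]
    loo_mean = fmean(leave_one_out)
    estimate = k * full - (k - 1) * loo_mean
    std_error = math.sqrt((k - 1) / k * sum((v - loo_mean) ** 2 for v in leave_one_out))
    return estimate, std_error


runs = [[3.0, 7.1, 9.0], [3.0, 6.8, 8.9], [3.1, 7.0, 9.2]]  # made-up estimates for a 3-node path
print(jackknife(runs, average_distance))
```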

4 Experiments 

The graphs analysed in this paper are graphs of Facebook users who were active in May of 2011; an active user is one who has logged in within the last 28 days. The decision to restrict our study to active users allows us to eliminate accounts that have been abandoned in early stages of creation, and focus on accounts that plausibly represent actual individuals. In accordance with Facebook's data retention policies, historical user activity records are not retained, and historical graphs for each year were constructed by considering currently active users that were registered on January 1st of that year, along with those friendship edges that were formed prior to that date. The "current" graph is simply the graph of active users at the time when the experiments were performed (May 2011). The graph predates the existence of Facebook "subscriptions", a directed relationship feature introduced in August 2011, and also does not include "pages" (such as celebrities) that people may "like". For standard user accounts on Facebook there is a limit of 5 000 possible friends.

We decided to extend our experiments in two directions: regional and temporal. We thus analyse the entire Facebook graph (fb), the USA subgraph (us), the Italian subgraph (it) and the Swedish (se) subgraph. We also analysed a combination of the Italian and Swedish graph (itse) to check whether combining two regional but distant networks could significantly change the average distance, in the same spirit as in Milgram's original experiment.⁵ For each graph we compute the distance distribution from 2007 up to today by performing several HyperANF runs, obtaining an estimate of values of neighbourhood function with relative standard deviation at most 5.8%: in several cases, however, we performed more runs, obtaining a higher precision. We report the jackknife [9] estimate of derived values (such as average distances) and the associated estimation of the standard error.

5 To establish geographic location, we use the users' current geo-IP location; this means, for example, that the users in the it-2007 graph are users who are today in Italy and were on Facebook on January 1, 2007 (most probably, American college students then living in Italy).

4.1 Setup 

The computations were performed on a 24-core machine with 72 GiB of memory and 1 TiB of disk space.⁶ The first task was to import the Facebook graph(s) into a compressed form for WebGraph [4], so that the multiple scans required by HyperANF's diffusive process could be carried out relatively quickly. This part required some massaging of Facebook's internal IDs into a contiguous numbering: the resulting current fb graph (the largest we analysed) was compressed to 345 GB at 20 bits per arc, which is 86% of the information-theoretical lower bound ($\log \binom{n^2}{m}$ bits, where n is the number of nodes and m the number of arcs).⁷ Whichever coding we choose, for half of the possible graphs with n nodes and m arcs we need at least $\lfloor \log \binom{n^2}{m} \rfloor$ bits per graph: the purpose of compression is precisely to choose the coding so as to represent interesting graphs in a smaller space than that required by the bound.
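The information-theoretical lower bound is easy to evaluate numerically through log-gamma functions. In the sketch below (helper names are ours), plugging in the rounded sizes of the current fb graph from Table 2, with each friendship link counted as two arcs, gives roughly 23.3 bits per arc, so the 20.1 bits per arc reported in Table 1 are indeed about 86% of the bound.

```python
import math


def log2_binomial(N: int, k: int) -> float:
    """log2 of the binomial coefficient C(N, k), computed in floating point via lgamma."""
    return (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)) / math.log(2)


def lower_bound_bits_per_arc(nodes: int, arcs: int) -> float:
    """log2 C(n^2, m) divided by m: the lower bound expressed in bits per arc."""
    return log2_binomial(nodes * nodes, arcs) / arcs


n = 721_100_000              # nodes of the current fb graph (Table 2, rounded)
m = 2 * 68_700_000_000       # 68.7G friendship links, each stored as two symmetric arcs
bound = lower_bound_bits_per_arc(n, m)
print(f"{bound:.1f} bits/arc; 20.1 bits/arc is {20.1 / bound:.0%} of it")
```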

To understand what is happening, we recall that WebGraph uses the BV compression scheme [4], which applies three intertwined techniques to the successor list of a node:

• successors are (partially) copied from previous nodes within a small window, if successor lists are similar enough;

• successors are intervalised, that is, represented by a left extreme and a length, if significant contiguous successor sequences appear;

• successors are gap-compressed if they pass the previous phases: instead of storing the actual successor list, we store the differences of consecutive successors (in increasing order) using instantaneous codes.

Thus, a graph compresses well when it exhibits similarity (nodes with near indices have similar successor lists) and locality (successor lists have small gaps).
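Two of the three techniques are simple enough to sketch. The code below splits a sorted successor list into intervals and residuals, and then writes the residuals as gaps in Elias γ code. It is a simplified illustration, not WebGraph's BV implementation: copying from previous nodes is omitted, the first residual is coded as its value plus one rather than relative to the node's own index, BV's actual choice of codes is not reproduced, and the function names and minimum interval length are hypothetical.

```python
from typing import List, Tuple


def elias_gamma(x: int) -> str:
    """Elias γ code of a positive integer as a bit string: unary length prefix, then binary."""
    if x < 1:
        raise ValueError("γ codes are defined only for x >= 1")
    binary = bin(x)[2:]
    return "0" * (len(binary) - 1) + binary


def intervalise(successors: List[int], min_len: int = 3) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a sorted successor list into (left extreme, length) intervals and residuals."""
    intervals: List[Tuple[int, int]] = []
    residuals: List[int] = []
    i = 0
    while i < len(successors):
        j = i
        while j + 1 < len(successors) and successors[j + 1] == successors[j] + 1:
            j += 1
        if j - i + 1 >= min_len:
            intervals.append((successors[i], j - i + 1))
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals


def gap_encode(residuals: List[int]) -> str:
    """γ-code the differences between consecutive residuals (all >= 1 for an increasing list)."""
    bits: List[str] = []
    previous = -1  # makes the first gap equal to the first residual plus one
    for s in residuals:
        bits.append(elias_gamma(s - previous))
        previous = s
    return "".join(bits)


successors = [1, 2, 3, 4, 10, 12, 13, 14, 40]
intervals, residuals = intervalise(successors)
print(intervals, residuals, gap_encode(residuals))
```

A γ code spends 2⌊log₂ x⌋ + 1 bits on a gap x, so the smaller the gaps, the shorter the encoding: this is why locality pays off.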

The better-than-random result above (usually, randomly permuted graphs compressed with WebGraph occupy 10-20% more space than the lower bound) has most likely been induced by the renumbering process, as in the original stream of arcs all arcs going out from a node appeared consecutively; as a consequence, the renumbering process assigned consecutive labels to all yet-unseen successors (e.g., in the initial stages successors were labelled contiguously), inducing some locality.

6 We remark that the commercial value of such hardware is of the order of a few thousand dollars.

7 Note that we measure compression with respect to the lower bound on arcs, as WebGraph stores directed graphs; however, with the additional knowledge that the graph is undirected, the lower bound should be applied to edges, thus doubling, in practice, the number of bits used.

[Plot: series "Before LLP" and "After LLP"; x-axis: logarithm of successor gaps.]

Figure 1: The change in distribution of the logarithm of the gaps between successors when the current fb graph is permuted by layered label propagation. See also Table 1.

It is also possible that the "natural" order for Facebook (essentially, join order) gives rise to some improvement over the information-theoretical lower bound because users often join the network at around the same time as several of their friends, which causes a certain amount of locality and similarity, as circles of friends have several friends in common.

We were interested, in the first place, in establishing whether more locality could be induced by suitably permuting the graph using layered label propagation [2] (LLP). This approach (which computes several clusterings with different levels of granularity and combines them to sort the nodes of a graph so as to increase its locality and similarity) has recently led to the best compression ratios for social networks when combined with the BV compression scheme. An increase in compression means that we were able to partly understand the cluster structure of the graph.
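To give a flavour of the clustering step, the sketch below runs plain label propagation at a single resolution and then orders nodes by label, so that nodes of the same cluster receive nearby indices. It is not LLP itself, which combines several clusterings computed at different resolutions. The update rule used here, where a node takes the label λ maximising k_λ − γ(v_λ − k_λ) (k_λ neighbours with label λ, v_λ nodes with label λ overall), is modelled on the rule of [2] but should be treated as an assumption, as should the fixed visiting order, the tie-breaking and all function names.

```python
from collections import Counter
from typing import List


def label_propagation(adj: List[List[int]], gamma: float = 0.0, max_rounds: int = 20) -> List[int]:
    """Single-resolution label propagation; gamma = 0 gives classic label propagation."""
    n = len(adj)
    labels = list(range(n))           # every node starts in its own cluster
    volume = Counter(labels)          # v_λ: number of nodes currently carrying label λ
    for _ in range(max_rounds):
        changed = False
        for x in range(n):            # fixed visiting order keeps the sketch deterministic
            if not adj[x]:
                continue
            k = Counter(labels[y] for y in adj[x])  # k_λ: neighbours of x carrying label λ

            def score(label: int) -> float:
                return k[label] - gamma * (volume[label] - k[label])

            best = min(k, key=lambda label: (-score(label), label))  # ties: smallest label
            if best != labels[x]:
                volume[labels[x]] -= 1
                volume[best] += 1
                labels[x] = best
                changed = True
        if not changed:
            break
    return labels


def order_by_label(labels: List[int]) -> List[int]:
    """New node order: nodes grouped by label, ties broken by original index."""
    return sorted(range(len(labels)), key=lambda x: (labels[x], x))


# Two interleaved triangles, {0, 2, 4} and {1, 3, 5}.
adj = [[2, 4], [3, 5], [0, 4], [1, 5], [0, 2], [1, 3]]
labels = label_propagation(adj)
print(labels, order_by_label(labels))  # [2, 3, 2, 3, 2, 3] [0, 2, 4, 1, 3, 5]
```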

We remark that each of the clusterings required by LLP is in itself a tour de force, as the graphs we analyse are almost two orders of magnitude larger than any network used for experiments in the literature on graph clustering. Indeed, applying LLP to the current Facebook graph required ten days of computation on our hardware.

We applied layered label propagation and re-compressed our graphs (the current version), obtaining a significant improvement. In Table 1 we show the results: we were able to reduce the graph size by 30%, which suggests that LLP has been able to discover several significant clusters.

The change in structure can be easily seen from Figure 1, where we show the distribution of the binary logarithm of gaps between successors for the current fb graph. The smaller the gaps, the higher the locality. In the graph with renumbered Facebook IDs, the distribution is bimodal: there is a local maximum at two, showing that there is some locality, but the bulk of the probability mass is around 20-21, which is slightly less than the information-theoretical lower bound (≈ 23).
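The quantity plotted in Figure 1 is easy to compute for any graph held in memory. The sketch below (hypothetical helper names) builds the distribution of ⌊log₂ gap⌋ over consecutive successors, and a second helper applies a node permutation such as one produced by a clustering; on a toy graph of two interleaved triangles, regrouping the triangles moves the mass towards smaller gaps.

```python
from collections import Counter
from typing import Dict, List


def log_gap_distribution(adj: List[List[int]]) -> Dict[int, float]:
    """Fraction of gaps g between consecutive sorted successors with floor(log2 g) = k."""
    histogram: Counter = Counter()
    for successors in adj:
        s = sorted(set(successors))
        for a, b in zip(s, s[1:]):
            histogram[(b - a).bit_length() - 1] += 1  # floor(log2(gap)) for gap >= 1
    total = sum(histogram.values())
    if not total:
        return {}
    return {k: histogram[k] / total for k in sorted(histogram)}


def permute(adj: List[List[int]], order: List[int]) -> List[List[int]]:
    """Relabel nodes so that node order[i] becomes node i."""
    new_id = {old: new for new, old in enumerate(order)}
    result: List[List[int]] = [[] for _ in adj]
    for old, successors in enumerate(adj):
        result[new_id[old]] = sorted(new_id[y] for y in successors)
    return result


adj = [[2, 4], [3, 5], [0, 4], [1, 5], [0, 2], [1, 3]]  # interleaved triangles
print(log_gap_distribution(adj))                               # only gaps of 2 and 4
print(log_gap_distribution(permute(adj, [0, 2, 4, 1, 3, 5])))  # mostly gaps of 1
```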

In the graph permuted with LLP, however, the distribution radically changes: it is now (mostly) beautifully monotonically decreasing, with a very small bump at 23, which testifies to the existence of a small core of "randomness" in the graph that LLP was not able to tame.

Regarding similarity, we see an analogous phenomenon: the number of successors represented by copy has doubled, going from 9% to 18%. The last datum is in line with other social networks (web graphs, on the contrary, are extremely redundant and more than 80% of the successors are usually copied). Moreover, disabling copying altogether results in a modest increase in size (≈ 5%), again in line with other social networks, which suggests that for most applications it is better to disable copying altogether to obtain faster random access.

The compression ratio is around 53%, which is similar to that of other social networks, such as LiveJournal (55%) or DBLP (40%) [2].⁸ For other graphs (see Table 1), however, it is slightly worse. This might be due to several phenomena. First, our LLP runs were executed with only half the number of clusters, and for each cluster we restricted the number of iterations to just four, to make the whole execution of LLP feasible. Thus, our runs are capable of finding considerably less structure than the runs we had previously performed for other networks. Second, the number of nodes is much larger: there is some cost in writing down gaps (e.g., using γ, δ or ζ codes) that is dependent on their absolute magnitude, and the lower bound does not take into account that cost.

4.2 Running 

Since most of the graphs, because of their size, had to be accessed by memory mapping, we decided to store all counters (both those for B(x, r − 1) and those for B(x, r)) in main memory, to avoid excessive I/O. The runs of HyperANF on the current whole Facebook graph used 32 registers, so the space for counters was about 27 GiB (e.g., we could have analysed a graph with four times the number of nodes on the same hardware). As a rough measure of speed, a run on the LLP-compressed current whole Facebook graph requires about 13.5 hours. Note that these timings would scale linearly with an increase in the number of cores.
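The 27 GiB figure can be checked with a line of arithmetic, under the assumption (ours) that registers are packed densely at 5 bits each and that exactly two counters per node are kept, one for B(x, r − 1) and one for B(x, r). The helper name is hypothetical.

```python
def counter_memory_gib(nodes: int, registers: int, bits_per_register: int = 5, copies: int = 2) -> float:
    """Main memory, in GiB, for `copies` densely packed HyperLogLog counters per node."""
    total_bits = nodes * registers * bits_per_register * copies
    return total_bits / 8 / 2**30


print(f"{counter_memory_gib(721_100_000, 32):.1f} GiB")  # 26.9 GiB, i.e. about 27 GiB
```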

4.3 General comments 

In September 2006, Facebook was opened to non-college students: there was an instant surge in subscriptions, as our data shows. In particular, the it and se subgraphs from January 1, 2007 were highly disconnected, as shown by the incredibly low percentage of reachable pairs we estimate in Table 9. Even Facebook itself was rather disconnected, but all the data we compute stabilizes (with small oscillations) after 2009, with essentially all pairs reachable. Thus, we consider the data for 2007 and 2008 useful to observe the evolution of Facebook, but we do not consider them representative of the underlying human social-link structure.

8 The interested reader will find similar data for several types of networks at the LAW web site (http://law.dsi.unimi.it/).

            it           se           itse         us           fb
Original    14.8 (83%)   14.0 (86%)   15.0 (82%)   17.2 (82%)   20.1 (86%)
LLP         10.3 (58%)   10.2 (63%)   10.3 (56%)   11.6 (56%)   12.3 (53%)

Table 1: The number of bits per link and the compression ratio (with respect to the information-theoretical lower bound) for the current graphs in the original order and for the same graphs permuted by layered label propagation.

Figure 2: The probability mass functions of the distance distributions of the current graphs (truncated at distance 10).





            it        se        itse      us        fb
2007        1.31      3.90      1.50      119.61    99.50
2008        5.88      46.09     36.00     106.05    76.15
2009        50.82     69.60     55.91     111.78    88.68
2010        122.92    100.85    118.54    128.95    113.00
2011        198.20    140.55    187.48    188.30    169.03
current     226.03    154.54    213.30    213.76    190.44

Table 4: Average degree of the datasets.




[Plot: x-axis: year (2007 to current).]

Figure 3: The average distance graph. See also Table 6.





            it       se       itse     us        fb
2007        0.04     10.23    0.19     100.00    68.02
2008        25.54    93.90    80.21    99.26     89.04

Table 9: Percentage of reachable pairs 2007-2008.



4.4 The distribution 

Figure 2 displays the probability mass functions of the current graphs. We will discuss later the variation of the average distance and spid, but qualitatively we can immediately distinguish the regional graphs, concentrated around distance four, and the whole Facebook graph, concentrated around distance five. The distributions of it and se, moreover, have significantly less probability mass concentrated on distance five than itse and us. The variance data (Table 7 and Figure 4) show that the distribution quickly became extremely concentrated.
            it                 se               itse               us                 fb
2007        159.8K (105.0K)    11.2K (21.8K)    172.1K (128.8K)    8.8M (529.3M)      13.0M (644.6M)
2008        335.8K (987.9K)    1.0M (23.2M)     1.4M (24.3M)       20.1M (1.1G)       56.0M (2.1G)
2009        4.6M (116.0M)      1.6M (55.5M)     6.2M (172.1M)      41.5M (2.3G)       139.1M (6.2G)
2010        11.8M (726.9M)     3.0M (149.9M)    14.8M (878.4M)     92.4M (6.0G)       332.3M (18.8G)
2011        17.1M (1.7G)       4.0M (278.2M)    21.1M (2.0G)       131.4M (12.4G)     562.4M (47.5G)
current     19.8M (2.2G)       4.3M (335.7M)    24.1M (2.6G)       149.1M (15.9G)     721.1M (68.7G)

Table 2: Number of nodes and, in parentheses, friendship links of the datasets. Note that each friendship link, being undirected, is represented by a pair of symmetric arcs.
            it        se        itse      us        fb
2007        387.0K    51.0K     461.9K    1.8G      2.3G
2008        3.9M      96.7M     107.8M    4.0G      9.2G
2009        477.9M    227.5M    840.3M    9.1G      28.7G
2010        3.6G      623.0M    4.5G      26.0G     93.3G
2011        8.0G      1.1G      9.6G      53.6G     238.1G
current     8.3G      1.2G      9.7G      68.5G     344.9G

Table 3: Size in bytes of the datasets.



Lower bounds from HyperANF runs
            it    se    itse    us    fb
2007        41    17    41      13    14
2008        28    17    24      17    16
2009        21    16    17      16    15
2010        18    19    19      19    15
2011        17    20    17      18    35
current     19    19    19      20    58

Exact diameter of the giant component
current     25    23    27      30    41

Table 10: Lower bounds for the diameter of all graphs, and exact values for the giant component (> 99.7%) of current graphs computed using the iFUB algorithm.

4.5 Average degree and density 

Table 4 shows the relatively quick growth in time of the average degree of all graphs we consider. The more users join the network, the more existing friendship links are uncovered. In Figure 6 we show a log-log-scaled plot of the same data: with the small set of points at our disposal, it is difficult to draw reliable conclusions, but we are not always observing the power-law behaviour suggested in [16]: see, for instance, the change of the slope for the us graph.⁹



9 We remind the reader that on a log-log plot almost anything "looks like" a straight line. The quite illuminating examples shown in [17], in particular, show that goodness-of-fit tests are essential.



The density of the network, on the contrary, decreases.¹⁰ In Figure 5 we plot the density (number of edges divided by the number of possible edges) of the graphs against the number of nodes (see also Table 5). There is some initial alternating behaviour, but on the more complete networks (fb and us) the trend in sparsification is very evident.

Geographical concentration, however, increases density: in Figure 5 we can see the lines corresponding to our regional graphs clearly ordered by geographical concentration, with the fb graph in the lowest position.
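Footnote 10 fixes density in the standard sense, 2m/(n(n - 1)) for an undirected graph with n nodes and m edges. A minimal Python sketch of that formula follows; the function name is ours, and the inputs in the usage line are the rounded user and link counts quoted in the abstract, not the exact figures behind Table 5.

```python
def density(n: int, m: int) -> float:
    """Fraction of the n(n - 1)/2 possible undirected edges that are present."""
    if n < 2:
        raise ValueError("density is undefined for fewer than two nodes")
    return 2 * m / (n * (n - 1))


# Rounded figures from the abstract: about 721 million users, 69 billion links.
print(f"{density(721_000_000, 69_000_000_000):.3e}")
```

With these rounded inputs the result is about 2.65 × 10⁻⁷, in line with the 2.641E-07 reported for the current fb graph in Table 5.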

4.6 Average distance

The results concerning average distance¹¹ are displayed in Figure 3 and Table 6. The average distance¹² on the Facebook current graph is 4.74.¹³ Moreover, a closer look at the distribution shows that 92% of the reachable pairs of individuals are at distance five or less.

We note that on both the it and se graphs we find a significantly lower, but similar, value. We interpret this result as telling us that the average distance depends on the geographical closeness of users more than on the actual size of the network. This is confirmed by the higher average distance of the itse graph.

During the fastest growing years of Facebook our graphs show a quick decrease in the average distance, which, however, now appears to be stabilizing. This is not surprising, as "shrinking diameter" phenomena are always observed when a large network is "uncovered", in the sense that we look at larger and larger induced subgraphs of the underlying global human network. At the same time, as we already remarked, density was going down steadily. We thus see the small-world phenomenon fully at work: a smaller fraction of arcs connecting the users, but nonetheless a lower average distance.

To make the "degree of separation" idea more concrete, in Table 11 we show the percentage of reachable pairs within the ceiling of the average distance (note, again, that it is the percentage relative to the reachable pairs): for instance, in the current Facebook graph 92% of the pairs of reachable users are within distance five, that is, four degrees of separation.

10 We remark that the authors of [16] call densification the increase of the average degree, in contrast with established literature in graph theory, where density is the fraction of edges with respect to all possible edges (e.g., 2m/(n(n - 1))). We use "density", "densification" and "sparsification" in the standard sense.

11 The data we report is about the average distance between reachable pairs, for which the name average connected distance has been proposed [5]. This is the same measure as that used by Travers and Milgram in [23]. We refrain from using the word "connected" as it somehow implies a bidirectional (or, if you prefer, undirected) connection. The notion of average distance between all pairs is useless in a graph in which not all pairs are reachable, as it is necessarily infinite, so no confusion can arise.

12 In some previous literature (e.g., [16]), the 90th percentile (possibly with some interpolation) of the distance distribution, called effective diameter, has been used in place of the average distance. Having at our disposal tools that can easily compute the average distance, which is a parameterless, standard feature of the distance distribution that has been used in social sciences for decades, we prefer to stick to it. Experimentally, on web and social graphs the average distance is about two thirds of the effective diameter plus one [3].

13 Note that both Karinthy and Guare had in mind the maximum, not the average number of degrees, so they were actually upper bounding the diameter.

          it          se          itse        us          fb
2007      8.224E-06   3.496E-04   8.692E-06   1.352E-05   7.679E-06
2008      1.752E-05   4.586E-05   2.666E-05   5.268E-06   1.359E-06
2009      1.113E-05   4.362E-05   9.079E-06   2.691E-06   6.377E-07
2010      1.039E-05   3.392E-05   7.998E-06   1.395E-06   3.400E-07
2011      1.157E-05   3.551E-05   8.882E-06   1.433E-06   3.006E-07
current   1.143E-05   3.557E-05   8.834E-06   1.434E-06   2.641E-07

Table 5: Density of the datasets.

          it              se             itse           us             fb
2007      10.25 (±0.17)   5.95 (±0.07)   8.66 (±0.14)   4.32 (±0.02)   4.46 (±0.04)
2008      6.45 (±0.03)    4.37 (±0.03)   4.85 (±0.05)   4.75 (±0.02)   5.28 (±0.03)
2009      4.60 (±0.02)    4.11 (±0.01)   4.94 (±0.02)   4.73 (±0.02)   5.26 (±0.03)
2010      4.10 (±0.02)    4.08 (±0.02)   4.43 (±0.03)   4.64 (±0.02)   5.06 (±0.01)
2011      3.88 (±0.01)    3.91 (±0.01)   4.17 (±0.02)   4.37 (±0.01)   4.81 (±0.04)
current   3.89 (±0.02)    3.90 (±0.04)   4.16 (±0.01)   4.32 (±0.01)   4.74 (±0.02)

Table 6: The average distance (± standard error). See also Figures 3 and 7.

4.7 Spid



The spid is the index of dispersion σ²/μ (a.k.a. variance-to-mean ratio) of the distance distribution. Some of the authors proposed the spid [5] as a measure of the "webbiness" of a social network. In particular, networks with a spid larger than one should be considered "web-like", whereas networks with a spid smaller than one should be considered "properly social". We recall that a distribution is called under- or over-dispersed depending on whether its index of dispersion is smaller or larger than 1 (i.e., variance smaller or larger than the average distance), so a network is considered properly social or not depending on whether its distance distribution is under- or over-dispersed.
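The average distance of Section 4.6 and the spid are both derived from the first two moments of the distance distribution, so they can be computed together. The Python sketch below is ours, not the authors' code: it assumes `pairs_at[t]` counts reachable pairs at distance exactly t (pairs at distance 0, a node with itself, are left out in the example, a choice whose effect is negligible on large graphs), and it also returns the share of reachable pairs within the ceiling of the average distance, the quantity tabulated in Table 11.

```python
import math


def distance_stats(pairs_at):
    """Summary statistics of a distance distribution.

    pairs_at[t] is the number of reachable pairs at distance exactly t.
    Returns (average distance, variance, spid, ceiling of the average,
    fraction of reachable pairs within that ceiling).
    """
    total = sum(pairs_at)
    mean = sum(t * c for t, c in enumerate(pairs_at)) / total
    variance = sum(c * (t - mean) ** 2 for t, c in enumerate(pairs_at)) / total
    spid = variance / mean  # index of dispersion: < 1 means under-dispersed
    ceiling = math.ceil(mean)
    within = sum(pairs_at[: ceiling + 1]) / total
    return mean, variance, spid, ceiling, within
```

For a five-node path, counting ordered pairs and leaving distance 0 empty, `distance_stats([0, 8, 6, 4, 2])` returns an average distance of 2.0, a variance of 1.0, a spid of 0.5 (under-dispersed), a ceiling of 2 and a within-ceiling fraction of 0.7.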

The intuition behind the spid is that "properly social" networks strongly favour short connections, whereas in the web long connections are not uncommon. As we recalled in the introduction, the starting point of the paper was the question "What is the spid of Facebook?" The answer, confirming the data we gathered on different social networks in [3], is shown in Table 8. With the exception of the highly disconnected regional networks in 2007-2008 (see Table 9), the spid is well below one.

Interestingly, across our collection of graphs we can confirm that there is in general little correlation between the average distance and the spid: Kendall's τ is -0.0105; graphical evidence of this fact can be seen in the scatter plot shown in Figure 7.
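The paper does not state which variant of Kendall's τ it reports, so the Python sketch below is only one plausible reading: it computes τ-b, which corrects for ties (the spid values in Table 8 contain several), by brute force over all pairs. The function name is ours; O(n²) time is adequate for a few dozen points such as our collection of graphs.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation, computed over all pairs of points."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            # Tied in both coordinates: counts as a tie for x and for y.
            ties_x += 1
            ties_y += 1
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n0 = len(xs) * (len(xs) - 1) // 2
    # Undefined (division by zero) if either sequence is entirely tied.
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
```

A value near 0, such as the -0.0105 above, means that concordant and discordant pairs roughly balance out.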

If we consider points associated with a single network, though, there appears to be some correlation between average distance and spid, in particular in the more connected networks (the values for Kendall's τ are all above 0.6, except for se). However, this is just an artifact, as by definition the spid depends inversely on the average distance (larger average distance, smaller spid). What is happening is that in this case the variance (see Table 7) is changing in the same direction: smaller average distances (which would imply a larger spid) are associated with smaller variances. Figure 8 displays the mild correlation between average distance and variance in the graphs we analyse: as a network gets tighter, its distance distribution also gets more concentrated.

          it              se             itse            us             fb
2007      32.46 (±1.49)   3.90 (±0.12)   16.62 (±0.87)   0.52 (±0.01)   0.65 (±0.02)
2008      3.78 (±0.18)    0.69 (±0.04)   1.74 (±0.15)    0.82 (±0.02)   0.86 (±0.03)
2009      0.64 (±0.04)    0.56 (±0.02)   0.84 (±0.02)    0.62 (±0.02)   0.69 (±0.05)
2010      0.40 (±0.01)    0.50 (±0.02)   0.64 (±0.03)    0.53 (±0.02)   0.52 (±0.01)
2011      0.38 (±0.03)    0.50 (±0.02)   0.61 (±0.02)    0.39 (±0.01)   0.42 (±0.03)
current   0.42 (±0.03)    0.52 (±0.04)   0.57 (±0.01)    0.40 (±0.01)   0.41 (±0.01)

Table 7: The variance of the distance distribution (± standard error). See also Figure 4.

          it              se              itse            us              fb
2007      3.17 (±0.106)   0.66 (±0.016)   1.92 (±0.078)   0.12 (±0.003)   0.15 (±0.004)
2008      0.59 (±0.026)   0.16 (±0.008)   0.36 (±0.028)   0.17 (±0.003)   0.16 (±0.005)
2009      0.14 (±0.007)   0.14 (±0.004)   0.17 (±0.004)   0.13 (±0.003)   0.13 (±0.009)
2010      0.10 (±0.003)   0.12 (±0.005)   0.14 (±0.006)   0.11 (±0.004)   0.10 (±0.002)
2011      0.10 (±0.006)   0.13 (±0.006)   0.15 (±0.004)   0.09 (±0.003)   0.09 (±0.005)
current   0.11 (±0.007)   0.13 (±0.010)   0.14 (±0.003)   0.09 (±0.003)   0.09 (±0.003)

Table 8: The index of dispersion of distances, a.k.a. spid (± standard error). See also Figure 7.

4.8 Diameter 

HyperANF cannot provide exact results about the diameter: however, the number of steps of a run is necessarily a lower bound for the diameter of the graph (the set of registers can stabilize before a number of iterations equal to the diameter because of hash collisions, but never after). While there are no statistical guarantees on this datum, in Table 10 we report these maximal observations as lower bounds that differ significantly between regional graphs and the overall Facebook graph: there are people who are significantly more "far apart" in the world than in a single nation.¹⁴
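The lower-bound argument can be seen on an exact version of the same diffusion. The Python sketch below is our illustration, not HyperANF itself: it keeps an exact set per node where HyperANF keeps a HyperLogLog counter, computes the neighbourhood function N(t) (the number of pairs at distance at most t), and stops when no set grows. With exact sets the number of steps equals the largest finite distance; approximate counters can only stop earlier, which is why the number of steps of a HyperANF run is a lower bound. Memory is quadratic in the worst case, so the sketch is meant for toy graphs only.

```python
def neighbourhood_function(adj):
    """Exact N(t) for t = 0, 1, ... by diffusing exact balls along the arcs.

    adj[v] lists the successors of node v. HyperANF keeps a HyperLogLog
    counter per node instead of an exact set; exact sets are used here only
    to make the stopping rule easy to see.
    """
    balls = [{v} for v in range(len(adj))]  # B(v, 0) = {v}
    counts = [sum(len(b) for b in balls)]   # N(0)
    while True:
        # B(v, t + 1) is B(v, t) united with B(w, t) for every successor w.
        new_balls = []
        for v, successors in enumerate(adj):
            ball = set(balls[v])
            for w in successors:
                ball |= balls[w]
            new_balls.append(ball)
        # Balls only grow, so equal sizes mean nothing changed: stop.
        if all(len(nb) == len(b) for nb, b in zip(new_balls, balls)):
            return counts
        balls = new_balls
        counts.append(sum(len(b) for b in balls))
```

On a five-node path the sketch returns [5, 13, 19, 23, 25], so the run takes `len(counts) - 1 = 4` steps, exactly the diameter of the path.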

To corroborate this information, we decided to also approach the problem of computing the exact diameter directly, although it is in general a daunting task: for very large graphs matrix-based algorithms are simply not feasible in space, and the basic algorithm running n breadth-first visits is not feasible in time. We thus implemented a highly parallel version of the iFUB (iterative Fringe Upper Bound) algorithm introduced in [5] (extending the ideas of [15]) for undirected graphs.

14 Incidentally, as we already remarked, this is the measure that Karinthy and Guare actually had in mind.

The basic idea is as follows: consider some node x, and find (by a breadth-first visit) a node y farthest from x. Now find a node z farthest from y: d(y, z) is a (usually very good) lower bound on the diameter, and it is actually the diameter if the graph is a tree (this is the "double sweep" algorithm).

We now consider a node c halfway between y and z: such a node is "in the middle of the graph" (actually, it would be a center if the graph were a tree), so if h is the eccentricity of c (the distance of the farthest node from c), then 2h is always an upper bound for the diameter by the triangle inequality, and we expect it to be a good one.

If our upper and lower bounds match, we are finished. Otherwise, we consider the fringe: the nodes at distance exactly h from c. Clearly, if M is the maximum of the eccentricities of the nodes in the fringe, max{2(h - 1), M} is a new (and hopefully improved) upper bound, since two nodes that both lie within distance h - 1 of c are at most 2(h - 1) apart, and any pair involving a fringe node is at most M apart; moreover, M is a new (and hopefully improved) lower bound. We then iterate the process by examining fringes closer to the root until the bounds match.
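The double sweep and fringe iteration described above can be sketched in Python as follows. This is a sequential illustration under our own assumptions (a connected undirected graph given as adjacency lists, the halfway node c found by walking back from z along BFS distances, hypothetical function names), not the parallel implementation used for Table 10.

```python
from collections import deque


def bfs(adj, source):
    """Distances from source in an unweighted graph (-1 marks unreachable nodes)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def farthest(dist):
    """A node at maximum distance, together with that distance."""
    node = max(range(len(dist)), key=dist.__getitem__)
    return node, dist[node]


def ifub_diameter(adj, start=0):
    """Exact diameter of a connected undirected graph given as adjacency lists."""
    # Double sweep: y is farthest from start, z is farthest from y.
    y, _ = farthest(bfs(adj, start))
    dist_y = bfs(adj, y)
    z, lower = farthest(dist_y)  # d(y, z) is a lower bound on the diameter

    # Walk back from z along a shortest path until we are halfway to y.
    c = z
    while dist_y[c] > lower // 2:
        c = next(v for v in adj[c] if dist_y[v] == dist_y[c] - 1)

    dist_c = bfs(adj, c)
    if min(dist_c) < 0:
        raise ValueError("graph must be connected")
    h = max(dist_c)  # eccentricity of c
    upper = 2 * h    # triangle inequality through c

    # fringes[i] holds the nodes at distance exactly i from c.
    fringes = [[] for _ in range(h + 1)]
    for v, d in enumerate(dist_c):
        fringes[d].append(v)

    i = h
    while lower < upper:
        # Eccentricities of fringe nodes raise the lower bound...
        for v in fringes[i]:
            lower = max(lower, max(bfs(adj, v)))
        # ...and a pair farther apart than 2(i - 1) must use a fringe already seen.
        upper = max(2 * (i - 1), lower)
        i -= 1
    return lower
```

On a 6-cycle, for example, the double sweep yields the lower bound 3 and c has eccentricity 3, so the initial upper bound is 6; examining the fringes at distance 3 and 2 from c brings it down to 3 and the loop stops.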

Our implementation uses a multicore breadth-first visit: the queue of nodes at distance d is segmented into small blocks handled by each core. At the end of a round, we have computed the queue of nodes at distance d + 1. Our implementation was able to discover the diameter of the current us graph (which fits into main memory, thanks to LLP compression) in about twenty minutes. The diameter of Facebook required ten hours of computation on a machine with 1TiB of RAM (actually, 256GiB would have been sufficient, again because of LLP compression).
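The round structure of such a multicore breadth-first visit can be sketched as below. This is our own Python illustration with hypothetical names and parameters (`block_size`, `workers`), not the authors' implementation; because of CPython's global interpreter lock the threads will not actually speed up this pure-Python loop, so the sketch only shows how a level is split into blocks, expanded independently, and merged into the queue for distance d + 1.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_block(adj, dist, block):
    """Unvisited neighbours of one block of the current frontier (reads dist only)."""
    return [v for u in block for v in adj[u] if dist[v] == -1]


def level_synchronous_bfs(adj, source, block_size=64, workers=4):
    """BFS that expands each level in blocks, one task per block."""
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier, d = [source], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            blocks = [frontier[i:i + block_size]
                      for i in range(0, len(frontier), block_size)]
            # list() waits for every task, so no distance is written while
            # workers are still reading dist during this round.
            found = list(pool.map(lambda b: expand_block(adj, dist, b), blocks))
            next_frontier = []
            for candidates in found:
                for v in candidates:
                    if dist[v] == -1:  # nodes found by several blocks are kept once
                        dist[v] = d + 1
                        next_frontier.append(v)
            frontier, d = next_frontier, d + 1
    return dist
```

Merging sequentially keeps the sketch free of data races; a real multicore implementation would instead need atomic or otherwise synchronised updates to mark visited nodes.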
          it          se         itse       us         fb
2007      ?? (11)     64% (6)    67% (9)    95% (5)    91% (5)
2008      77% (7)     93% (5)    77% (5)    83% (5)    91% (6)
2009      90% (5)     96% (5)    75% (5)    86% (5)    94% (6)
2010      98% (5)     97% (5)    91% (5)    91% (5)    97% (6)
2011      90% (4)     86% (4)    95% (5)    97% (5)    89% (5)
current   88% (4)     86% (4)    97% (5)    97% (5)    91% (5)

Table 11: Percentage of reachable pairs within the ceiling of the average distance (shown between parentheses).




Figure 4: The graph of variances of the distance distributions. See also Table 7.

The values reported in Table 10 confirm what we discovered using the approximate data provided by the length of HyperANF runs, and suggest that while the distribution has a low average distance and is quite concentrated, there are nonetheless (rare) pairs of nodes that are much farther apart. We remark that in the case of the current fb graph, the diameter of the giant component is actually smaller than the bound provided by the HyperANF runs, which means that long paths appear in small (and likely very irregular) components.

4.9 Precision 

As already discussed in [3], it is very difficult to obtain strong theoretical bounds on data derived from the distance distribution. The problem is that when passing from the neighbourhood function to the distance distribution, the relative error bound becomes an absolute error bound: since the distance distribution attains very small values (in particular in its tail), there is a concrete risk of incurring significant errors when computing the average distance or other statistics. On the other hand, the distribution of derived data is extremely concentrated [3].

Figure 6: A plot correlating number of nodes to the average degree (for the graphs from 2009 on).

Figure 7: A scatter plot showing the (lack of) correlation between the average distance and the spid.

Figure 8: A scatter plot showing the mild correlation between the average distance and the variance.

Figure 9: The evolution of the relative error in a HyperANF computation with relative standard deviation 9.25% on a small social network (dblp-2010).

There is, however, a clear empirical explanation of the unexpected accuracy of our results that is evident from an analysis of the evolution of the empirical relative error of a run on a social network. We show an example in Figure 9.

• In the very first steps, all counters contain essentially disjoint sets; thus, they behave as independent random variables, and under this assumption their relative error should be significantly smaller than expected: indeed, this is clearly visible from Figure 9.

• In the following few steps, the distribution reaches its highest value. The error oscillates, as counters are now significantly dependent on one another, but in this part the actual value of the distribution is rather large, so the absolute theoretical error turns out to be rather good.

• Finally, in the tail each counter contains a very large subset of the reachable nodes: as a result, all counters behave in a similar manner (as the hash collisions are essentially the same for every counter), and the relative error stabilises to an almost fixed value. Because of this stabilisation, the relative error on the neighbourhood function transfers, in practice, to a relative error on the distance distribution. To see why this happens, observe the behaviour of the variation of the relative error, which is quite erratic initially, but then converges quickly to zero. The variation is the only part of the relative error that becomes an absolute error when passing to the distance distribution, so the computation on the tail is much more accurate than what the theoretical bound would imply.
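The role of the variation can be made explicit with a small calculation. If every N(t) is estimated as N(t)(1 + ε_t), the estimated number of pairs at distance t is off by ε_t·(N(t) - N(t - 1)) + (ε_t - ε_{t-1})·N(t - 1): the first term is a relative error, while the variation ε_t - ε_{t-1} is multiplied by N(t - 1), which in the tail dwarfs the true count. The Python sketch below (function names ours, toy numbers from the exact neighbourhood function of a 5-node path) shows a constant ε passing through unchanged, while a jump in ε on the last step alone produces a 125% error on the last entry.

```python
def distribution_from_nf(nf):
    """Per-distance pair counts N(0), N(1) - N(0), N(2) - N(1), ..."""
    return [nf[0]] + [nf[t] - nf[t - 1] for t in range(1, len(nf))]


def distribution_relative_errors(nf, eps):
    """Relative error of each distance-distribution entry when every N(t)
    is estimated as N(t) * (1 + eps[t])."""
    estimate = [n * (1 + e) for n, e in zip(nf, eps)]
    true_d = distribution_from_nf(nf)
    est_d = distribution_from_nf(estimate)
    return [(e - d) / d for e, d in zip(est_d, true_d)]


nf = [5, 13, 19, 23, 25]  # exact neighbourhood function of a 5-node path
print(distribution_relative_errors(nf, [0.1] * 5))          # every entry ~0.1
print(distribution_relative_errors(nf, [0, 0, 0, 0, 0.1]))  # last entry ~1.25
```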
We remark that our considerations remain valid for any diffusion-based algorithm using approximate, statistically dependent counters (e.g., ANF [21]).

5 Conclusions 

In this paper we have studied the largest electronic social network ever created (≈721 million active Facebook users and their ≈69 billion friendship links) from several viewpoints.

First of all, we have confirmed that layered labelled propagation [2] is a powerful paradigm for increasing the locality of a social network by permuting its nodes. We have been able to compress the us graph at 11.6 bits per link, 56% of the information-theoretical lower bound, similarly to other, much smaller social networks.

Using HyperANF, we then analysed the complete Facebook graph and 29 other graphs obtained by restricting the links involved geographically or temporally. We have in fact carried out the largest Milgram-like experiment ever performed. The average distance of Facebook is 4.74, that is, 3.74 "degrees of separation", prompting the title of this paper. The spid of Facebook is 0.09, well below one, as expected for a social network. Geographically restricted networks have a smaller average distance, as happened in Milgram's original experiment. Overall, these results help paint a picture of what the Facebook social graph looks like. As expected, it is a small-world graph, with short paths between many pairs of nodes. However, the high degree of compressibility and the study of geographically limited subgraphs show that geography plays a huge role in forming the overall structure of the network. Indeed, we see in this study, as well as in other studies of Facebook [1], that, while the world is connected enough for short paths to exist between most nodes, there is a high degree of locality induced by various externalities, geography chief amongst them, all reminiscent of the model proposed in

When Milgram first published his results, he in fact offered two opposing interpretations of what "six degrees of separation" actually meant. On the one hand, he observed that such a distance is considerably smaller than what one would naturally intuit. But at the same time, Milgram noted that this result could also be interpreted to mean that people are on average six "worlds apart": "When we speak of five¹⁵ intermediaries, we are talking about an enormous psychological distance between the starting and target points, a distance which seems small only because we customarily regard 'five' as a small manageable quantity. We should think of the two points as being not five persons apart, but 'five circles of acquaintances' apart — five 'structures' apart." [20]. From this gloomier perspective, it is reassuring to see that our findings show that people are in fact only four worlds apart, and not six: when considering another person in the world, a friend of your friend knows a friend of their friend, on average.

15 Five is the median of the number of intermediaries reported in the first paper by Milgram [20], from which our quotation is taken. More experiments were performed with Travers [23] with a slightly greater average, as reported in Section 2.